Abstract



An ancestry labeling scheme assigns labels (bit strings) to the nodes of rooted trees such that 
ancestry queries between any two nodes in a tree can be answered merely by looking at their 
corresponding labels. The quality of an ancestry labeling scheme is measured by its label size, 
that is the maximal number of bits in a label of a tree node. 

In addition to its theoretical appeal, the design of efficient ancestry labeling schemes is
motivated by applications in web search engines. For this purpose, even small improvements in
the label size are important. In fact, the literature on this topic is interested in the exact
label size rather than just its order of magnitude. As a result, following the proposal of a
simple interval-based ancestry scheme with label size $2\log_2 n$ bits (Kannan et al., STOC '88),
a considerable amount of work was devoted to improving the bound on the size of a label. The
current state-of-the-art upper bound is $\log_2 n + O(\sqrt{\log n})$ bits (Abiteboul et al., SODA '02),
which is still far from the known $\log_2 n + \Omega(\log\log n)$ bits lower bound (Alstrup et al., SODA '03).

In this paper we close the gap between the known lower and upper bounds, by constructing
an ancestry labeling scheme with label size $\log_2 n + O(\log\log n)$ bits. In addition to the optimal
label size, our scheme assigns the labels in linear time and can support any ancestry query in
constant time.



"This research is supported in part by the ANR projects ALADDIN and PROSE, and by the INRIA project 
GANG. 



1 Introduction 



1.1 Background 

In this paper we consider the following problem. Given an n-node rooted tree T, label the nodes of 
T in the most compact way such that given any pair of nodes u and v, one can determine whether 
u is an ancestor of v in T by merely inspecting the labels of u and v. The main quality measure 
used to evaluate such an ancestry labeling scheme is the label size, that is, the maximum number 
of bits stored in a label of a node, taken over all nodes in all possible n-node rooted trees. 

The above elegant problem is not only of fundamental interest but is also
useful for performance enhancement of XML search engines. In the context of this application,
each indexed document is a tree, and the labels of all trees are maintained in the main memory.¹
Therefore, even small improvements in the label size are important, and, in fact, the literature
on this topic is interested in the exact label size rather than just its order of magnitude (e.g.,
label size $\frac{3}{2}\log n$ bits is considered significantly better than label size $2\log n$ bits).²

Ancestry schemes currently used by actual systems are variants of the following
simple interval-based ancestry labeling scheme [16] (see also [IE]). Given an n-node tree T, perform
a DFS traversal in T starting at the root, and provide each node u with a DFS number dfs(u)
in the range $[0, n-1]$. Then the label of a node u is simply the interval $I(u) = [\mathrm{dfs}(u), \mathrm{dfs}(v)]$,
where v is the descendant of u with largest DFS number (v = u if u is a leaf). An ancestry query then amounts to an
interval containment query between the corresponding labels: a node u is an ancestor of a node v
if and only if $I(v) \subseteq I(u)$. Clearly, the label size of this scheme is $2\log n$ bits.
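
The interval-based scheme described above can be sketched in a few lines of Python. This is only an illustration: the dictionary-of-children tree representation and the names `interval_labels` and `is_ancestor` are assumptions made for the example, not part of [16].

```python
def interval_labels(children, root):
    """Map every node to the closed interval (dfs(u), largest DFS number in u's subtree)."""
    first = {root: 0}      # dfs(u): DFS number of u
    labels = {}
    counter = 1            # next unused DFS number
    stack = [(root, iter(children.get(root, ())))]
    while stack:
        node, pending = stack[-1]
        child = next(pending, None)
        if child is None:
            # All descendants are numbered: the last number handed out closes the interval.
            stack.pop()
            labels[node] = (first[node], counter - 1)
        else:
            first[child] = counter
            counter += 1
            stack.append((child, iter(children.get(child, ()))))
    return labels


def is_ancestor(label_u, label_v):
    """Interval containment test between the labels of two distinct nodes."""
    return label_u != label_v and label_u[0] <= label_v[0] and label_v[1] <= label_u[1]


# Hypothetical 5-node tree: 0 -> {1, 4}, 1 -> {2, 3}.
tree = {0: [1, 4], 1: [2, 3]}
labels = interval_labels(tree, 0)
print(labels[1], is_ancestor(labels[0], labels[3]), is_ancestor(labels[1], labels[4]))
# prints: (1, 3) True False
```

Each label stores two numbers in the range $[0, n-1]$, which is where the $2\log n$ bits come from.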

An elegant lower bound of $\log n + \Omega(\log\log n)$ bits on the label size is given in [3]. This lower
bound holds even for a very restricted family of trees, each composed of equal-length simple paths
hanging down from the root.

In the other direction, a considerable amount of research has been devoted to improving the upper
bound on the label size as much as possible beyond the trivial $2\log n$ bound [1, 2, 6, 8, 15, 20].
Specifically, [2] gave a first non-trivial upper bound of $\frac{3}{2}\log n + O(\log\log n)$ bits. This was improved
the year after to $\log n + O(\sqrt{\log n})$ [6], which is the current best upper bound (that scheme is
described in detail in the joint journal publication [1]). In addition to its relatively small label
size, the scheme in [6] also assigns labels in linear time and can answer any ancestry query in
constant time. Independently of that work, an ancestry labeling scheme with larger label size of
$\log n + O(\log n / \log\log n)$ was given in [20].

Following the above results, two other works were published, which focused on particular types
of trees. Specifically, an experimental comparison of different ancestry labeling schemes on XML
trees that appear in real life can be found in [15]. Recently, [8] gave an ancestry labeling scheme
which is efficient for trees of small depth; specifically, for n-node trees with depth d, their scheme
uses labels of size $\log n + 2\log d + O(1)$.

1.2 Our results 

In this paper we close the gap between the known lower and upper bounds, by constructing an
ancestry labeling scheme for general rooted n-node trees with label size $\log n + O(\log\log n)$. This
solves one of the main open problems in the field of informative labeling schemes. In addition
to the optimal label size, our scheme assigns the labels to the nodes of a tree in linear time, and
guarantees that any ancestry query can be answered in constant time.

¹ Details on XML search engines and their relation to ancestry labeling schemes can be found in, e.g., [1], [3], [5].
² All logarithms in this paper are taken in base 2.

1.3 Related work 

As explained in [16], the names of nodes in traditional graph representations reveal no information
about the graph structure and hence memory is wasted. Moreover, typical representations are
usually global in nature, i.e., in order to derive useful information, one must access a global data
structure representing the entire network, even if the sought information is local, pertaining to only
a few nodes. In contrast, the notion of informative labeling schemes, introduced in [16], involves an
informative method for assigning labels to nodes. Specifically, the assignment is made in a way that
allows one to infer information regarding any two nodes directly from their labels, without using
any additional information sources. Hence, in essence, this method bases the entire representation
on the set of labels alone. This method was illustrated in [16] by giving two elegant and simple
labeling schemes for n-node trees: one supporting adjacency queries and the other supporting
ancestry queries. Both schemes incur $2\log n$ label size.

As mentioned earlier, ancestry labeling schemes were further investigated in [1, 2, 3, 6, 8, 15, 20],
and the current state-of-the-art upper and lower bounds are $\log n + O(\sqrt{\log n})$ and $\log n + \Omega(\log\log n)$,
respectively. Adjacency labeling schemes on trees were also further investigated in an attempt to
optimize the label size beyond the simple $2\log n$ bound of [16]. The current state-of-the-art upper
bound [5] for that problem is $\log n + O(\log^* n)$.

Labeling schemes were also proposed for other decision problems on graphs, including distance 
[3 QUI [IH] j routing [7| Q21 120] , flow [H] [TT] , vertex connectivity [121 E] , nearest common ancestor 
[H [IT], and various other tree functions, such as center, separation level, and Steiner weight of a 
given subset of vertices [T7]. See [U] for a partial survey on labeling schemes. 

2 Preliminaries 

Let T be a tree rooted at some node r, referred to as the root of T. For two nodes u and v in T, we
say that u is an ancestor of v if $u \neq v$ and u is one of the nodes on the shortest path connecting v
and r in T. For every non-root node u, let parent(u) denote the parent of u, i.e., the ancestor of u
at distance 1 from it. A node v is a descendant of u if and only if u is an ancestor of v.

The depth of a node $u \in V(T)$ is defined as the distance from u to the root of T, i.e., the
number of edge traversals from u to the root. In particular, the depth of the root is 0. The size
of T, denoted by $|T|$, is the number of nodes in T. The weight of a node $u \in V(T)$, denoted by
weight(u), is defined as 1 plus the number of descendants of u, i.e., weight(u) is the size of the
subtree hanging down from u. In particular, the weight of the root is weight(r) = $|T|$. Let $\mathcal{T}(n)$
denote the family of all rooted trees of size at most n.

An ancestry labeling scheme $(\mathcal{M}, \mathcal{D})$ for the family of trees $\mathcal{T}(n)$ is composed of the following
two components:

1. A marker algorithm $\mathcal{M}$ that, given a tree $T \in \mathcal{T}(n)$, assigns labels (i.e., bit strings) to its
nodes.
2. A decoder algorithm $\mathcal{D}$ that, given any two labels $\ell_1$ and $\ell_2$ in the output domain of $\mathcal{M}$,
returns a boolean $\mathcal{D}(\ell_1, \ell_2)$.

These components must satisfy that if L(u) and L(v) denote the labels assigned by the marker
to two nodes u and v in some rooted tree $T \in \mathcal{T}(n)$, then

$$\mathcal{D}(L(u), L(v)) = 1 \iff u \text{ is an ancestor of } v \text{ in } T.$$

It is important to note that the decoder $\mathcal{D}$ is independent of the tree T. That is, given the labels of
two nodes, the decoder decides the ancestry relationship between the corresponding nodes without
knowing to which tree in $\mathcal{T}(n)$ they belong.

The common complexity measure used to evaluate the quality of an ancestry labeling scheme
$(\mathcal{M}, \mathcal{D})$ is the label size, that is, the maximum number of bits in a label assigned by the marker
algorithm $\mathcal{M}$ to any node in any tree $T \in \mathcal{T}(n)$.

When considering the query time of the decoder, we use the RAM model of computation,
and assume that the length of a computer word is $\Theta(\log n)$ bits. Similarly to [6], our decoder
algorithm uses only basic and fast RAM operations such as addition, subtraction, left/right
shifts and less-than comparisons. Our scheme avoids the sometimes more costly operations such
as multiplication and division, as well as non-standard operations whose results are pre-computed
and stored in a table.

Notations. For every two nodes v and w in T, let $P[v,w]$ denote the shortest path connecting v
and w in the tree (including v and w), and let $P[v,w) = P[v,w] \setminus \{w\}$.

For two integers $a \le b$, let $[a,b]$ denote the set of integers $\{a, a+1, \ldots, b\}$. We refer to this set
as an interval. For two intervals $I = [a,b]$ and $I' = [a',b']$, we say that $I \prec I'$ if $b < a'$. The size of
an interval $I = [a,b]$, denoted by $|I|$, is the number of integers in I, i.e., $|I| = b - a + 1$.

3 Modifying the interval containment test 

Our scheme is inspired by the scheme in [8], which was designed for trees of bounded depth. Given
a rooted tree T, the label assigned to each node by the scheme in [8] is a pointer to some interval,
and an ancestry query between any two given nodes is answered by a simple interval containment
test between the corresponding intervals. The underlying idea of that scheme consists in proving
that, if T is of small depth, then one can choose the intervals from a small set of intervals U in
which the intervals are well nested within themselves. A pointer to an interval in U can be encoded
using $\log |U|$ bits, and thus, since U is relatively small, the scheme uses short labels. Unfortunately,
this approach is no longer efficient when the tree has long paths. Indeed, in that case, the set U of
nested intervals becomes too large.

Informally, forcing the decoder to be merely an interval containment test imposes a strong
constraint on the way the intervals must be organized in U. For arbitrary trees, we could not find
a way to bypass this constraint while keeping U small. Instead, we introduce a decoder which, on
the one hand, makes the ancestry test somewhat more complicated than when using the interval
containment test (yet the test remains very simple), but, on the other hand, makes it possible to organize
and nest the intervals in such a way that labels become very small. Our new decoder exploits the
fact that intervals can be partially ordered not only by the containment relation, but also by the
relation $\prec$ introduced in the previous section.

Figure 1: The heavy nodes are depicted in black, while the light nodes are depicted in white.
In the figure, sup(v) = u and sup(w) = w. We have parent(w) = v. Hence, the set of local
quasi-ancestors of w is lqa(w) = {x, y, a, b, c, d, e, v, w'} if dfs(w) > dfs(w'), and
lqa(w) = {x, y, a, b, c, d, e, v} otherwise. Similarly, we have lqa(x) = $\emptyset$ if dfs(y) > dfs(x), and
lqa(x) = {y} otherwise.

Given any node u of some rooted tree T, we associate u with an interval I(u), and with a
supervisor node, denoted by sup(u), which is either u itself or one of its ancestors. For this purpose,
we first mark each node as either heavy or light as follows. For every non-leaf node u of T, let H(u)
be the set of children v of u that satisfy weight(v) $\ge$ weight(w) for every child w of u. Among the
nodes in H(u), select an arbitrary node, and call it heavy. A node which is not heavy is called light.
(In particular, the root is light.)

For each node $u \in T$, define the supervisor of u, denoted by sup(u), as the light node of largest
depth on the path $P[u,r]$ connecting u to the root r. Note that if u is light then sup(u) is u itself;
in particular, sup(r) = r, and sup(sup(u)) = sup(u). Observe also that if u is an ancestor of v,
then either sup(u) = sup(v), or sup(u) is an ancestor of sup(v). See Figure 1.
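
As an illustration of the heavy/light marking and of the supervisor definition, here is a minimal Python sketch. The tree representation (a dictionary mapping a node to its list of children) and the helper names are assumptions of this example; ties between children of maximum weight are broken by taking the first one, which is one admissible "arbitrary" choice.

```python
def subtree_weights(children, root):
    """weight(u) = 1 + number of descendants of u."""
    order, stack = [], [root]
    while stack:                      # parents are appended before their children
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, ()))
    weight = {}
    for u in reversed(order):         # so children are weighed before their parent
        weight[u] = 1 + sum(weight[c] for c in children.get(u, ()))
    return weight


def supervisors(children, root):
    """Return (heavy, sup): the set of heavy nodes and the supervisor of every node."""
    weight = subtree_weights(children, root)
    heavy, sup, stack = set(), {root: root}, [root]
    while stack:
        u = stack.pop()
        kids = children.get(u, ())
        if kids:
            heavy.add(max(kids, key=weight.get))   # one child of maximum weight
        for c in kids:
            # A heavy node inherits its parent's supervisor; a light node supervises itself.
            sup[c] = sup[u] if c in heavy else c
            stack.append(c)
    return heavy, sup


# Hypothetical tree: r -> {a, b}, a -> {c, d}, c -> {e}.
tree = {"r": ["a", "b"], "a": ["c", "d"], "c": ["e"]}
heavy, sup = supervisors(tree, "r")
print(sorted(heavy), sup["e"], sup["d"])
# prints: ['a', 'c', 'e'] r d
```

In this example the heavy nodes a, c, e form a single path hanging from the light root r, so all three share the supervisor r.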

As we will show later, the basic rule of our decoder relies on the following definition which is a 
modification of the interval containment test used in several previous schemes. 

Definition 1 Let us consider a set of intervals $\{I(u), u \in V\}$ for a tree T. We assume that all
intervals in the set are distinct, i.e., $I(u) \neq I(v)$ for any two distinct nodes u and v. We say that
the decoding conditions hold at u w.r.t. v if and only if

• D1: $I(v) \subset I(\mathrm{sup}(u))$, and

• D2: $I(u) \prec I(v)$ or $I(u) = I(\mathrm{sup}(u))$.
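
The two decoding conditions translate directly into a test on three intervals. The sketch below represents an interval as a pair `(lo, hi)` of integers; the function names and the sample intervals are hypothetical and only serve to show that, unlike plain containment, the test can succeed even when I(v) is not contained in I(u).

```python
def strictly_inside(x, y):
    """x is a proper sub-interval of y."""
    return x != y and y[0] <= x[0] and x[1] <= y[1]


def precedes(x, y):
    """The relation x < y on intervals: x ends before y starts."""
    return x[1] < y[0]


def decoding_conditions_hold(i_u, i_sup_u, i_v):
    """D1 and D2 at u w.r.t. v, given I(u), I(sup(u)) and I(v)."""
    d1 = strictly_inside(i_v, i_sup_u)
    d2 = precedes(i_u, i_v) or i_u == i_sup_u
    return d1 and d2


# Hypothetical intervals for a light root r, its light child l, its heavy child h,
# and the (heavy) child c of h; sup(h) = sup(c) = r and sup(l) = l.
I = {"r": (0, 99), "l": (10, 19), "h": (20, 29), "c": (30, 39)}
print(decoding_conditions_hold(I["h"], I["r"], I["c"]),   # h is an ancestor of c
      decoding_conditions_hold(I["c"], I["r"], I["h"]),   # c is not an ancestor of h
      decoding_conditions_hold(I["l"], I["l"], I["c"]))   # l is not an ancestor of c
# prints: True False False
```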
Given the labels L(u) and L(v) of two nodes in a rooted tree, our boolean decoder $\mathcal{D}$ outputs 1
if and only if the decoding conditions hold at u w.r.t. v. Our marker algorithm will then guarantee
the following:

• the decoder is correct, i.e., the intervals associated with each node are selected such that,
for any two nodes u and v, the decoding conditions hold at u w.r.t. v if and only if u is an
ancestor of v;

• given a tree T, the intervals and labels can be assigned to all nodes in T in linear time;

• given a label L(u), the intervals $I(u)$ and $I(\mathrm{sup}(u))$ can be computed in constant time;

• each label is encoded using $\log n + O(\log\log n)$ bits.

4 The $\log n + O(\log\log n)$ ancestry labeling scheme

We are now ready to prove our main result, that is:

Theorem 1 There is an ancestry labeling scheme for $\mathcal{T}(n)$ with label size $\lceil \log n \rceil + 6\lceil \log\log n \rceil + 7$
and constant query time. Moreover, given a tree T, the labels can be assigned to the nodes of T in
linear time.

We prove Theorem 1 by constructing an ancestry labeling scheme $(\mathcal{M}, \mathcal{D})$ with the desired
properties.
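
To make the bound of Theorem 1 concrete, the following sketch simply evaluates the formula and compares it with the $2\log n$ bits of the interval-based scheme. It assumes that n is a power of 2; the helper names are introduced only for this example.

```python
def ceil_log2(x):
    """Smallest k with x <= 2**k, for an integer x >= 1."""
    return (x - 1).bit_length()


def theorem1_label_bits(n):
    """ceil(log n) + 6 * ceil(log log n) + 7, as stated in Theorem 1."""
    log_n = ceil_log2(n)
    return log_n + 6 * ceil_log2(log_n) + 7


def interval_scheme_bits(n):
    """Two DFS numbers of ceil(log n) bits each."""
    return 2 * ceil_log2(n)


print(theorem1_label_bits(2 ** 64), interval_scheme_bits(2 ** 64))
# prints: 107 128
```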

4.1 The marker algorithm $\mathcal{M}$

For simplicity of presentation assume that n is a power of 2, and let us fix a tree $T \in \mathcal{T}(n)$. Our
marker algorithm $\mathcal{M}$ first assigns an interval to each node in such a way that u is an ancestor of
v if and only if the decoding conditions hold at u w.r.t. v. For this purpose, we first show that it is
sufficient to provide an assignment of intervals that satisfies a more "local" condition.

4.1.1 The local partial order conditions 

Let us first assign numbers from 0 to $n-1$ to the nodes according to a DFS traversal that starts
at the root, and visits light children first. We denote by dfs(u) the DFS number of u. Let
$P_u = P[\mathrm{parent}(u), \mathrm{sup}(\mathrm{parent}(u))]$. We define the local quasi-ancestors of u, denoted by lqa(u),
as all nodes in $P_u$, together with their light children, but removing u, sup(parent(u)), and all
nodes that have DFS numbers higher than u. See Figure 1 for an example. Note that the local
quasi-ancestors of a node may not form a connected subtree of T.
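
The definition of lqa(u) can be followed step by step in code. The sketch below is an illustration under assumptions: the tree is a dictionary of child lists, ties between children of maximum weight are broken by taking the first one, and light children are visited in list order. The function name is hypothetical.

```python
def local_quasi_ancestors(children, root):
    """Return {u: lqa(u)} for every non-root node u."""
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, ()))
    weight = {}
    for u in reversed(order):
        weight[u] = 1 + sum(weight[c] for c in children.get(u, ()))
    heavy = {max(children[u], key=weight.get) for u in order if children.get(u)}

    parent, sup = {}, {root: root}
    for u in order:                                # parents are handled before children
        for c in children.get(u, ()):
            parent[c] = u
            sup[c] = sup[u] if c in heavy else c

    dfs, stack = {}, [root]
    while stack:                                   # DFS that visits light children first
        u = stack.pop()
        dfs[u] = len(dfs)
        kids = children.get(u, ())
        lights = [c for c in kids if c not in heavy]
        stack.extend([c for c in kids if c in heavy] + lights[::-1])

    lqa = {}
    for u in order[1:]:
        top = sup[parent[u]]
        path, x = [parent[u]], parent[u]
        while x != top:                            # climb from parent(u) to sup(parent(u))
            x = parent[x]
            path.append(x)
        candidates = set(path)
        for x in path:                             # add the light children of the path
            candidates.update(c for c in children.get(x, ()) if c not in heavy)
        candidates -= {u, top}
        lqa[u] = {x for x in candidates if dfs[x] < dfs[u]}
    return lqa


# Hypothetical 8-node tree; its heavy path from the root is 0, 1, 4, 6.
tree = {0: [1, 2, 3], 1: [4, 5], 4: [6], 2: [7]}
lqa = local_quasi_ancestors(tree, 0)
print(sorted(lqa[6]), sorted(lqa[3]), sorted(lqa[2]))
# prints: [1, 2, 3, 4, 5] [2] []
```

Node 6 illustrates the remark above: its local quasi-ancestors include the light nodes 2, 3 and 5, which are not on the path from 6 to the root.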

Definition 2 Let us consider a set of pairwise distinct intervals $\{I(u), u \in V\}$ for a tree T. For
every node u, we say that u satisfies the local partial order (LPO) conditions if the two conditions
below are satisfied:

• LPO1: $I(u) \subseteq I(\mathrm{sup}(u)) \cap I(\mathrm{sup}(\mathrm{parent}(u)))$ for every non-root node u;

• LPO2: $I(x) \prec I(u)$ for every local quasi-ancestor $x \in \mathrm{lqa}(u)$.

Claim 1 If every node u satisfies the local partial order condition LPO1, then for any light node u,
and any descendant v of u, we have $I(v) \subseteq I(u)$.

Proof. The claim is established by induction on the distance between u and v. If dist(u, v) = 1
then the claim holds by LPO1. Assume that the claim holds for dist(u, v) $\le$ d for d $\ge$ 1, and assume
dist(u, v) = d + 1. If sup(v) = u then the claim follows by LPO1. Thus assume sup(v) $\neq$ u. If
sup(parent(v)) = u then again the claim follows by LPO1. Thus assume also sup(parent(v)) $\neq$ u.
In this case there exists a light node $w \notin \{u, v\}$ on the shortest path connecting u to v. The claim
then follows by induction. □

The following claim relates the decoding conditions to the local partial order conditions. 

Claim 2 If every node u satisfies the local partial order conditions LPO1 and LPO2, then for every
two different nodes u and v, u is an ancestor of v if and only if the decoding conditions D1 and D2
hold at u w.r.t. v.

Proof. Assume that every node u satisfies the local partial order conditions. Consider first
the case that u is an ancestor of v. Since v is a descendant of u, either sup(u) = sup(v) or
sup(v) is a descendant of sup(u). Thus, by Claim 1, $I(v) \subseteq I(\mathrm{sup}(v)) \subseteq I(\mathrm{sup}(u))$. Since $v \neq u$,
$I(v) \subset I(\mathrm{sup}(u))$, i.e., D1 follows. If u is light then the fact that u and v satisfy D2 follows trivially
from the fact that, in this case, sup(u) = u. So assume now that u is heavy. If $u \in \mathrm{lqa}(v)$, then D2
follows from LPO2. Otherwise, if $u \notin \mathrm{lqa}(v)$, then there exists a light node w that is an ancestor of
v and such that $u \in \mathrm{lqa}(w)$. D2 follows by combining LPO2 with Claim 1.

Consider now the case that u is not an ancestor of v. We need to show that either D1 or D2
does not hold. Let w be the light node of largest depth on the path $P[v,r]$ which is an ancestor
of u. If w is v itself then sup(u) is either v or a descendant of v. Therefore, by Claim 1, we have
$I(\mathrm{sup}(u)) \subseteq I(v)$, and thus D1 is not satisfied. In the remaining proof we thus assume that $w \neq v$.

For a node x which is a descendant of w, and which satisfies sup(x) $\neq$ w, let f(x) be the light
node of smallest depth on the path $P[x,w)$. Assume first that sup(u) = w. If also sup(v) = w
then $v \in \mathrm{lqa}(u)$, and thus, by LPO2, $I(v) \prec I(u)$. Since $u \neq w$, we have $I(u) \neq I(\mathrm{sup}(u))$, and
therefore D2 is not satisfied. If, on the other hand, sup(v) $\neq$ w then we have $f(v) \in \mathrm{lqa}(u)$ and
$I(v) \subseteq I(f(v))$. Similarly to the previous case, this implies that D2 is not satisfied.

Assume now that sup(u) $\neq$ w. We have $I(\mathrm{sup}(u)) \subseteq I(f(u))$. If v is an ancestor of f(u) then
$v \in \mathrm{lqa}(f(u))$. Consequently, $I(v) \prec I(f(u))$ and thus D1 does not hold. On the other hand, if v is
not an ancestor of f(u) then we are left with two cases: sup(v) = w and sup(v) $\neq$ w. If sup(v) = w
then $I(f(u)) \prec I(v)$ since $f(u) \in \mathrm{lqa}(v)$. Thus D1 does not hold. Finally, if sup(v) $\neq$ w then either
$f(u) \in \mathrm{lqa}(f(v))$ or $f(v) \in \mathrm{lqa}(f(u))$. Since $I(\mathrm{sup}(u)) \subseteq I(f(u))$ and $I(v) \subseteq I(f(v))$, it follows that
D1 does not hold. □

4.1.2 The interval assignment 

By Claim 2, one of our goals is to let the marker assign intervals that satisfy the local partial order
conditions at each node. For integers a, b and k, let

$$I_{k,a,b} = [2^k a,\; 2^k (a+b)].$$

For $k \in [1, \log n]$, define the set of intervals:

$$\mathcal{I}_k = \left\{ I_{i,a,b} \;\middle|\; i \in [1,k],\ a \in \left[1, \frac{4n\log n}{2^i}\right],\ \text{and } b \in [1, 4\log n] \right\}.$$

Let $\mathcal{I} = \mathcal{I}_{\log n}$.

Definition 3 Let $T \in \mathcal{T}(n)$. We say that a mapping $I : V \to \mathcal{I}$ is a legal interval-mapping if the
mapping is one-to-one, and $\{I(u), u \in V\}$ satisfies the local partial order conditions at each node
of T.

In order to show that there exists a legal interval-mapping from every tree in $\mathcal{T}(n)$ into $\mathcal{I}$, we
use the following notation. For any interval $J \subseteq [1, 4n\log n]$, and for any k, $1 \le k \le \log n$, let

$$\mathcal{I}_k(J) = \{ I_{i,a,b} \in \mathcal{I}_k \mid I_{i,a,b} \subseteq J \}.$$

Lemma 1 For every $k \in [1, \log n]$, every tree $T \in \mathcal{T}(2^k)$, and every interval $J \subseteq [1, 4n\log n]$ such
that $|J| = 4k|T|$, there exists a legal interval-mapping of T into $\mathcal{I}_k(J)$. Moreover, this mapping
can be computed in $O(|T|)$ time.

Proof. We prove the lemma by induction on k. For k = 1, the lemma holds trivially. Assume
now that the claim holds for k with $1 \le k < \log n$, and let us show that it also holds for k + 1.
Clearly, if $|T| \le 2^k$ then we are done by induction.

Consider now the case where T is of size $2^k < |T| \le 2^{k+1}$, and let $J \subseteq [1, 4n\log n]$ be an interval
such that $|J| = 4(k+1)|T|$. Our goal is to show that there exists a legal interval-mapping of T into
$\mathcal{I}_{k+1}(J)$.

We make use of the following decomposition of T. Let H be the path from the root of T to a
leaf of T such that every non-root node in H is heavy. Let $v_1, v_2, \ldots, v_d$ be the nodes of H, ordered
top-down, i.e., $v_1$ is the root of T, $v_d$ is a leaf of T, and for every $1 \le i < d$, $v_i$ is the parent of $v_{i+1}$.

For every $1 \le i \le d$, let $T_i^1, T_i^2, \ldots, T_i^{t_i}$ be the rooted trees hanging down from the light children
of $v_i$. (If $v_i$ does not have any light child, which is the case, for example, for i = d, then this set of
trees is empty, or, in other words, $t_i = 0$.) One important property of these trees is that, for every
i and j, $1 \le j \le t_i$, we have $|T_i^j| \le |T|/2 \le 2^k$.

We now group the nodes in $T \setminus \{r\}$ into disjoint trees $T_1, T_2, \ldots, T_m$, where $m = (d-1) + \sum_{i=1}^{d} t_i$,
as follows. A tree $T_j$ is either a single heavy node $v_i$, for $i > 1$, or a subtree hanging from a light
child of some $v_i$, $i \ge 1$. Moreover, the trees are enumerated according to the DFS numbers of their
roots as follows. Recall that dfs(u) denotes the DFS number of node u in T. The trees are ordered such
that if $r_j$ denotes the root of $T_j$ then $\mathrm{dfs}(r_j) < \mathrm{dfs}(r_{j+1})$ for all $j = 1, \ldots, m-1$. See Figure 2.

Consider now the interval $J \subseteq [1, 4n\log n]$ such that $|J| = 4(k+1)|T|$, and express it as
$J = [\alpha, \alpha + 4(k+1)|T| - 1]$ for some integer $\alpha < 4n\log n$. Let a be the smallest integer such that
$\alpha \le a 2^{k+1}$, and let b be the smallest integer such that $4k|T| \le b 2^{k+1}$.

First, we assign the root r to the interval $J' = [a 2^{k+1}, (a+b) 2^{k+1}]$. We now show that indeed
$J' \in \mathcal{I}_{k+1}(J)$. By definition of a and b, we have

$$(a+b) 2^{k+1} = 2^{k+2} + ((a-1) + (b-1)) 2^{k+1} \le 2^{k+2} + \alpha + 4k|T| - 2.$$

Since $2^k < |T|$, we get that

$$(a+b) 2^{k+1} \le \alpha + (4k+4)|T| - 1.$$

Thus

$$J' = [a 2^{k+1}, (a+b) 2^{k+1}] \subseteq [\alpha, \alpha + (4k+4)|T| - 1] = J.$$

Therefore, since $1 \le a = \lceil \alpha / 2^{k+1} \rceil \le 4n\log n / 2^{k+1}$, and $1 \le b \le 4\log n$, we obtain that $J' \in \mathcal{I}_{k+1}(J)$.

Figure 2: Tree decomposition as in the proof of Lemma 1. In the figure, the DFS traversal that
visits light nodes first is supposed to proceed by visiting children from left to right.

The rest of the nodes in T are mapped as follows. First note that $|J'| \ge 4k|T|$, and recall that
$\sum_{i=1}^{m} |T_i| = |T| - 1$. We break $J'$ into m + 1 consecutive intervals $J'_1, J'_2, \ldots, J'_{m+1}$ such that, for
each $1 \le i \le m$, we have $|J'_i| = 4k|T_i|$ and $J'_i \prec J'_{i+1}$. For each $1 \le i \le m$, since $|T_i| \le 2^k$, we can
use the induction hypothesis to map the nodes in $T_i$ to $\mathcal{I}_k(J'_i)$ via a legal interval-mapping.

The fact that the above recursive mapping can be performed in linear time is obvious. It
remains to show that the above mapping of T into $\mathcal{I}_{k+1}(J)$ is indeed a legal interval-mapping.
That is, we have to show that the set of intervals satisfies the local partial order conditions LPO1
and LPO2 at each node u. The conditions hold trivially at the root r of T. The fact that the
conditions hold for every node u in $T_i \setminus \{r_i\}$ follows from the induction hypothesis, and because
sup(u), sup(parent(u)), and lqa(u) are all contained in $T_i$. Finally, consider the root $r_i$ of $T_i$. (Note
that if $r_i$ is heavy then $T_i = \{r_i\}$.) LPO1 holds trivially for $r_i$ because $J'$ contains $J'_i$, and, by
induction, the interval assigned to $r_i$ is contained in the interval $J'_i \subseteq J'$. To establish that LPO2
holds, first observe that $\mathrm{lqa}(r_i) = \{r_1, \ldots, r_{i-1}\}$. On the other hand, for every $j = 1, \ldots, i-1$,
$\mathrm{dfs}(r_j) < \mathrm{dfs}(r_i)$, and thus $J'_j \prec J'_i$. Hence LPO2 holds for $r_i$ as well. Our mapping is thus a legal
interval-mapping of T into $\mathcal{I}_{k+1}(J)$. This completes the proof of the lemma. □

By taking $k = \log n$ in the above lemma, we obtain the following.

Corollary 1 Let $T \in \mathcal{T}(n)$. There exists a legal interval-mapping of T into $\mathcal{I}_{\log n}([1, 4n\log n]) \subseteq \mathcal{I}$.
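
The recursive construction in the proof of Lemma 1 can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: the tree is a dictionary of child lists, ties between heaviest children are broken by taking the first one, the base case (trees with one or two nodes) is handled with the explicit choice $I_{1,a,s}$ for the root and $I_{1,a,1}$ for its child, and the index `k` in the code plays the role of k + 1 in the proof. The function `decode` applies the decoding conditions D1 and D2 to the resulting intervals.

```python
def legal_interval_mapping(children, root, alpha=1):
    """Return (interval, sup): a closed interval (lo, hi) and a supervisor per node."""
    order, stack = [], [root]
    while stack:                                   # parents appear before children
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, ()))
    weight = {}
    for u in reversed(order):                      # children before parents
        weight[u] = 1 + sum(weight[c] for c in children.get(u, ()))
    heavy = {u: max(children[u], key=weight.get) for u in order if children.get(u)}
    interval, sup = {}, {root: root}

    def place(node, k, a, b):
        interval[node] = (a << k, (a + b) << k)    # I_{k,a,b} = [2^k a, 2^k (a+b)]

    def assign(r, alpha):
        """Map the subtree of the light node r into a window starting at alpha."""
        s = weight[r]
        k = max(1, (s - 1).bit_length())           # smallest k >= 1 with s <= 2^k
        a = -(-alpha // 2 ** k)                    # smallest a with alpha <= a * 2^k
        if k == 1:                                 # base case: one or two nodes
            place(r, 1, a, s)
            if s == 2:
                sup[heavy[r]] = r
                place(heavy[r], 1, a, 1)
            return
        b = -(-4 * (k - 1) * s // 2 ** k)          # smallest b with 4(k-1)s <= b * 2^k
        place(r, k, a, b)
        cursor, v = a << k, r
        while True:                                # walk down the heavy path from r
            for c in children.get(v, ()):
                if c != heavy.get(v):              # light subtrees come first
                    sup[c] = c
                    assign(c, cursor)
                    cursor += 4 * (k - 1) * weight[c]
            if v not in heavy:
                return
            v = heavy[v]                           # then the heavy child, as a single node
            sup[v] = r
            place(v, 1, -(-cursor // 2), 1)
            cursor += 4 * (k - 1)

    assign(root, alpha)
    return interval, sup


def decode(interval, sup, u, v):
    """Decoding conditions D1 and D2 at u w.r.t. v."""
    (ul, uh), (vl, vh), (sl, sh) = interval[u], interval[v], interval[sup[u]]
    d1 = sl <= vl and vh <= sh and (vl, vh) != (sl, sh)
    d2 = uh < vl or (ul, uh) == (sl, sh)
    return d1 and d2


# Hypothetical 8-node tree; its heavy path from the root is 0, 1, 4, 6.
tree = {0: [1, 2, 3], 1: [4, 5], 4: [6], 2: [7]}
interval, sup = legal_interval_mapping(tree, 0)
print(interval[0], interval[1], interval[6])
# prints: (8, 72) (32, 34) (56, 58)
print(decode(interval, sup, 1, 6), decode(interval, sup, 6, 1), decode(interval, sup, 1, 2))
# prints: True False False
```

Note that the interval of node 6 is not contained in the interval of its ancestor 1; the ancestry is detected through condition D2 and the common supervisor 0.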
4.1.3 The label assignment



We are now ready to describe the label L(u) assigned to every node u by our marker algorithm $\mathcal{M}$.
Given a rooted tree $T \in \mathcal{T}(n)$, the marker $\mathcal{M}$ first marks each node as either heavy or light, and
then assigns the DFS numbers. This clearly takes linear time. Then the marker maps the nodes of
T into $\mathcal{I}$ using the legal interval-mapping given in Corollary 1. Again, this step takes linear time.

Given a node u, the marker uses the $\log n + 3\lceil \log\log n \rceil + 3$ least significant bits of L(u)
to encode the interval I(u). This can be done explicitly as I(u) is of the form $I_{i,a,b}$ for some
$i \in [1, \log n]$, $a \in [1, 4n\log n / 2^i]$, and $b \in [1, 4\log n]$.

Finally, the marker aims at encoding $I(\mathrm{sup}(u))$ in the label of u. However, using the method
above to encode $I(\mathrm{sup}(u))$ would consume yet another $\log n + 3\lceil \log\log n \rceil + 3$ bits, which is obviously
undesired. Instead, we use the following trick. Let $i'$, $a'$ and $b'$ be such that $I(\mathrm{sup}(u)) = I_{i',a',b'}$.

(Note that if sup(u) = u then we simply have $i' = i$, $a' = a$ and $b' = b$.) Clearly, $2\lceil \log\log n \rceil + 2$ bits
suffice to encode both $i'$ and $b'$. To encode $a'$, the marker acts as follows. Let $I(u) = [\alpha, \beta]$, and let $a''$
be the largest integer such that $2^{i'} a'' \le \alpha$. Recall that by definition $I(\mathrm{sup}(u)) = [2^{i'} a', 2^{i'} (a' + b')]$.
We have $a'' - 4\log n \le a'' - b'$ because $b' \le 4\log n$. Since $I(u) \subseteq I(\mathrm{sup}(u))$, we also have
$2^{i'} (a' + b') \ge \beta > \alpha \ge 2^{i'} a''$. Thus $a'' - b' < a'$. Finally, again since $I(u) \subseteq I(\mathrm{sup}(u))$, we have
$2^{i'} a' \le \alpha$, and thus $a'' \ge a'$. Combining the above inequalities, we get that $a' \in [a'' - 4\log n + 1, a'']$.
The marker now encodes the integer $t \in [0, 4\log n - 1]$ such that $a' = a'' - t$. This
consumes another $\lceil \log\log n \rceil + 2$ bits. Hence, the following follows by construction:

Lemma 2 Given a tree $T \in \mathcal{T}(n)$, the marker $\mathcal{M}$ assigns labels to the nodes of T in linear time,
and each label is encoded using $\log n + 6\lceil \log\log n \rceil + 7$ bits.
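
The offset trick used to store $a'$ can be checked on a small example. The sketch below is illustrative only; the function names and the sample values (an interval starting at $\alpha = 56$ whose supervisor has interval $I_{3,1,8}$) are assumptions made for this example.

```python
def sup_offset(alpha, i_sup, a_sup):
    """Marker side: the offset t such that a' = a'' - t."""
    a2 = alpha >> i_sup           # a'': largest integer with 2**i_sup * a'' <= alpha
    return a2 - a_sup


def sup_start(alpha, i_sup, t):
    """Decoder side: recover a' from alpha, i' and the stored offset t."""
    return (alpha >> i_sup) - t


# I(u) starts at alpha = 56 and I(sup(u)) = I_{3,1,8} = [8, 72].
t = sup_offset(56, 3, 1)
print(t, sup_start(56, 3, t))
# prints: 6 1
```

Only the small offset t is stored, instead of the full value of $a'$, which is what keeps the second interval from costing another $\log n$ bits.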

4.2 The decoder $\mathcal{D}$

Now, we describe our decoder $\mathcal{D}$. Given the labels L(u) and L(v) assigned by $\mathcal{M}$ to two different
nodes in some tree T, the decoder $\mathcal{D}$ needs to find whether u is an ancestor of v in T. (Observe
that since each node receives a distinct label, the decoder can easily find out if u and v are in fact
the same node, and, in this trivial case, it simply outputs 0.)

The decoder inspects the $\log n + 3\lceil \log\log n \rceil + 3$ least significant bits of L(u) to extract I(u)
(recall that $I(u) = I_{i,a,b}$ is encoded by storing explicitly the three parameters i, a, and b). Then, once
$I(u) = [\alpha, \beta]$ has been reconstructed from L(u), the decoder aims at extracting $I(\mathrm{sup}(u))$. For this
purpose, it first reconstructs $i'$ and $b'$ that have been explicitly encoded in the next $2\lceil \log\log n \rceil + 2$
bits. Then, it computes the largest integer $a''$ such that $2^{i'} a'' \le \alpha$. The decoder then proceeds
by extracting t, and computes $a' = a'' - t$. At this point, $\mathcal{D}$ has reconstructed both $I(u)$ and
$I(\mathrm{sup}(u))$. Similarly, $\mathcal{D}$ extracts I(v) by inspecting L(v).

Finally, the boolean decoder $\mathcal{D}$ outputs 1 if and only if the two decoding conditions D1 and D2
hold at u w.r.t. v (see Definition 1).

Lemma 3 Let L(u) and L(v) be two labels assigned by $\mathcal{M}$ to two nodes in T. The decoder
$\mathcal{D}(L(u), L(v))$ runs in constant time, and satisfies $\mathcal{D}(L(u), L(v)) = 1$ if and only if u is an
ancestor of v in T.

Proof. The fact that $\mathcal{D}(L(u), L(v)) = 1$ if and only if u is an ancestor of v in T follows from the
fact that the intervals are assigned by the marker via a legal interval-mapping (cf. Corollary 1).
Since $I(u) = I_{i,a,b} = [2^i a, 2^i (a+b)]$ with all three parameters i, a, and b stored explicitly, computing
I(u) from L(u) can be achieved in constant time. (Note that $2^i a$, for example, can be obtained
from a by a simple shift of i bits.) Similarly, I(v) can be extracted from L(v) in constant time.
Computing $I(\mathrm{sup}(u))$ just needs a simple subtraction and a division by a power of 2, which again
amounts to a simple shift operation. The lemma follows. □
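
One possible bit layout matching the accounting of Lemma 2 is sketched below, together with the shift-based reconstruction of I(u) and I(sup(u)). The layout is an assumption of this example (the text does not fix the order of the fields): each of i, a, b, i', b' is stored minus one, n is assumed to be a power of 2, and the function names are hypothetical.

```python
def field_widths(n):
    """Bit widths of the fields (i, a, b, i', b', t) for n a power of 2."""
    log_n = (n - 1).bit_length()
    loglog = (log_n - 1).bit_length()              # ceil(log log n)
    return [loglog, log_n + loglog + 1, loglog + 2, loglog, loglog + 2, loglog + 2]


def pack(n, i, a, b, i_sup, b_sup, t):
    """Marker side: concatenate the six fields, least significant field first."""
    label, shift = 0, 0
    values = [i - 1, a - 1, b - 1, i_sup - 1, b_sup - 1, t]
    for value, width in zip(values, field_widths(n)):
        assert 0 <= value < (1 << width), "field does not fit in its width"
        label |= value << shift
        shift += width
    return label


def unpack(n, label):
    """Decoder side: recover I(u) and I(sup(u)) using masks, shifts and a subtraction."""
    fields = []
    for width in field_widths(n):
        fields.append(label & ((1 << width) - 1))
        label >>= width
    i, a, b, i_sup, b_sup = (x + 1 for x in fields[:5])
    t = fields[5]
    lo = a << i
    a_sup = (lo >> i_sup) - t                      # a' = a'' - t
    return (lo, (a + b) << i), (a_sup << i_sup, (a_sup + b_sup) << i_sup)


# Example with n = 8: I(u) = I_{1,28,1} = [56, 58] and I(sup(u)) = I_{3,1,8} = [8, 72].
label = pack(8, 1, 28, 1, 3, 8, 6)
print(sum(field_widths(8)), unpack(8, label))
# prints: 22 ((56, 58), (8, 72))
```

For n = 8 the widths add up to 22 = 3 + 6 · 2 + 7 bits, in line with Lemma 2.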

This completes the proof of Theorem 1.



5 Conclusion 

Our ancestry labeling scheme uses labels of optimal size $\log_2 n + O(\log\log n)$ bits, at the price of
a decoding mechanism based on an interval condition slightly more complex than the simple interval
containment condition. Although this has no impact on the decoding time (our decoder still works
in constant time), the question of whether there exists an ancestry labeling scheme with labels of
size $\log_2 n + O(\log\log n)$ bits, but using solely the interval containment condition, is intriguing.

Acknowledgments: the authors are very thankful to Sundar Vishwanathan and Jean-Sebastien 
Sereni for helpful discussions. 